We consume multiple genres and subgenres of music every day. Developing an understanding of this music is a widely underappreciated accomplishment, yet the engineers at Spotify deemed it a necessary feature to integrate into their platform and business model. In doing so, they created an API that is open for developers to pull this data into their own applications and analytics. Other companies, such as Genius and Last.fm, have created APIs as well, bringing lyrics and genre tags, respectively, into developers' hands.
The Spotify API exposes features that describe the mood, feeling, and other characteristics of tracks. This is a fairly vast dataset of songs and song features that opens up many possibilities for analysis. Some questions that could be asked are: How does musical happiness change across the world? How does happiness relate to other key features in music? Can we build a model that captures this relationship? This post, and further explorations found in this repo, will explore these questions and more.

#plotting and wrangling
library(tidyverse)
library(highcharter)
library(factoextra)
library(patchwork)
library(sf)
#ml/modeling
library(caret)
#working with time variables
library(lubridate)
#kable
library(kableExtra)
#setting the seed for reproducibility
set.seed(543)
The Spotify API provides a number of measures defining the features found in music. These features are as follows (taken from the official documentation):
- **acousticness**: A confidence measure from 0.0 to 1.0 of whether the track is acoustic.
- **danceability**: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.
- **energy**: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity.
- **instrumentalness**: Predicts whether a track contains no vocals.
- **liveness**: Detects the presence of an audience in the recording.
- **loudness**: The overall loudness of a track in decibels (dB).
- **speechiness**: Speechiness detects the presence of spoken words in a track.
- **valence**: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.
One might think from these descriptions that energy and valence must be related. Let's plot them against each other to see.
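Before plotting, the relationship can be checked numerically. The sketch below is a minimal, self-contained version of that check: the `tracks` data frame here is simulated stand-in data (the real data would come from the Spotify API), with a built-in positive dependence of valence on energy so the snippet runs on its own.

```r
# Simulated stand-in for the real track data pulled from the Spotify API
set.seed(543)
n <- 500
energy <- runif(n)
tracks <- data.frame(
  energy  = energy,
  # valence depends positively on energy here by construction,
  # then is clamped to the API's 0-1 scale
  valence = pmin(pmax(0.2 + 0.5 * energy + rnorm(n, sd = 0.15), 0), 1)
)

# Pearson correlation between the two features; with this simulated
# slope it comes out clearly positive
r <- cor(tracks$energy, tracks$valence)
round(r, 2)
```

On the real data, the same `cor()` call (or a scatter plot with a fitted line) is the quickest way to see whether the relationship the feature descriptions suggest actually holds.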
Being able to see how happy certain albums are is nice, but what about albums across the globe? How does the average happiness of music change as we move overseas?
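The aggregation behind such a map is a simple group-and-average. A minimal sketch, assuming each track row carries a country code (simulated here; on the real data this column would come from chart or market metadata):

```r
library(dplyr)

# Simulated stand-in: tracks tagged with a country code
set.seed(543)
tracks <- data.frame(
  country = sample(c("US", "BR", "JP"), 300, replace = TRUE),
  valence = runif(300)
)

# Average "happiness" (valence) per country -- the quantity a choropleth
# (e.g. built with highcharter or sf) would shade
happiness_by_country <- tracks %>%
  group_by(country) %>%
  summarise(mean_valence = mean(valence), .groups = "drop")

happiness_by_country
```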
There are a few parameters in the Spotify API that are computed as functions of other parameters in the data: energy, loudness, and acousticness. For example, loudness is used in the calculation of energy. For this reason, these terms will be modeled as interaction terms in the regression model, and the plots demonstrating their pairwise relationships are omitted here. The documentation expounding on these parameters is found here.
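In `lm()` formula syntax, a full three-way interaction is written with `*`. A sketch of the model structure described above (the `tracks` data frame is again simulated stand-in data, not the real dataset):

```r
# Simulated stand-in so the sketch runs on its own; the real data frame
# would hold one row per song with the Spotify audio features
set.seed(543)
n <- 1000
tracks <- data.frame(
  danceability     = runif(n),
  energy           = runif(n),
  loudness         = rnorm(n),  # real loudness is in (negative) dB
  acousticness     = runif(n),
  speechiness      = runif(n),
  instrumentalness = runif(n),
  liveness         = runif(n)
)
tracks$valence <- with(tracks, 0.5 * danceability + 0.3 * energy + rnorm(n, sd = 0.1))

# energy, loudness, and acousticness enter as a full three-way interaction;
# the remaining features enter additively
fit <- lm(
  valence ~ danceability + speechiness + instrumentalness + liveness +
    energy * loudness * acousticness,
  data = tracks
)

# intercept + 7 main effects + 3 two-way + 1 three-way term = 12 coefficients
length(coef(fit))
```

The `energy * loudness * acousticness` term expands to all three main effects plus every two-way and three-way product, which matches the coefficient table in the model summary below.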
Let's now look at how these features relate to valence, as well as how well they can be used for an inferential model.

##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.51752 -0.12028 -0.01606 0.11570 0.64027
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.357405 0.094252 -3.792 0.000157 ***
## danceability 0.511593 0.034168 14.973 < 2e-16 ***
## energy 0.653016 0.113099 5.774 1.00e-08 ***
## loudness -0.013510 0.008845 -1.527 0.126948
## acousticness 0.302115 0.120668 2.504 0.012432 *
## speechiness 0.188917 0.047946 3.940 8.64e-05 ***
## instrumentalness -0.041068 0.022154 -1.854 0.064040 .
## liveness 0.052949 0.037499 1.412 0.158218
## `energy:loudness` 0.019084 0.012955 1.473 0.140994
## `energy:acousticness` -0.148657 0.187236 -0.794 0.427390
## `loudness:acousticness` 0.011871 0.010295 1.153 0.249084
## `energy:loudness:acousticness` 0.003036 0.016958 0.179 0.857961
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1844 on 1125 degrees of freedom
## Multiple R-squared: 0.343, Adjusted R-squared: 0.3365
## F-statistic: 53.39 on 11 and 1125 DF, p-value: < 2.2e-16
Danceability, energy, speechiness, the intercept, and, to a lesser extent, acousticness are significant predictors of valence. Even with the interaction terms, energy remains highly significant, with a p-value of about 1e-08. Unfortunately, the parameters listed explain only about 34% of the variance in valence, likely due to missing information and imperfect modeling of the individual parameters. Still, the coefficients point in the direction described above regarding energy and valence: with loudness and acousticness held at zero, a one-unit increase in energy corresponds to an expected increase of about 0.65 in valence.

| term | estimate |
|---|---|
| (Intercept) | -0.3574047 |
| danceability | 0.5115930 |
| energy | 0.6530159 |
| loudness | -0.0135102 |
| acousticness | 0.3021149 |
| speechiness | 0.1889168 |
| instrumentalness | -0.0410682 |
| liveness | 0.0529490 |
| energy:loudness | 0.0190843 |
| energy:acousticness | -0.1486566 |
| loudness:acousticness | 0.0118714 |
| energy:loudness:acousticness | 0.0030356 |
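Because energy appears in interaction terms, its marginal effect on valence depends on where loudness and acousticness are held. A small sketch using the fitted coefficients reported above (the `energy_effect` helper is mine, not part of the original analysis):

```r
# Coefficients copied from the fitted model summary above
b <- c(energy                         = 0.6530159,
       `energy:loudness`              = 0.0190843,
       `energy:acousticness`          = -0.1486566,
       `energy:loudness:acousticness` = 0.0030356)

# Marginal effect of a one-unit increase in energy, as a function of
# the values at which loudness and acousticness are held
energy_effect <- function(loudness, acousticness) {
  b["energy"] +
    b["energy:loudness"] * loudness +
    b["energy:acousticness"] * acousticness +
    b["energy:loudness:acousticness"] * loudness * acousticness
}

# At loudness = 0 dB and acousticness = 0, the effect reduces to the
# energy main effect
round(unname(energy_effect(0, 0)), 2)  # 0.65
```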
The model performs reasonably well. As stated above, the modest R-squared suggests that important information is missing from this analysis, anything from additional audio features to artist, album, and genre metadata. The latter are available in the dataset I created for this analysis and will be investigated in future posts. Other things I wanted to include here were out-of-sample analyses of the data, more exploratory analysis of individual artists and their albums, more maps comparing different parameters, and a Shiny app turning these plots into interactive visualizations in which the parameters being shown and compared can be changed and adjusted. These did not fit the scope of this preliminary analysis and were thus not included here.